#--------------------------------------------------------------------------------#
# Authors: 	Diana Da In Lee, Yamil R. Velez
# Title: 	Measuring Descriptive Representation at Scale: Methods for Predicting the Race and Ethnicity of Public Officials
# Date: 	2024-07-01
# Copyright (c) 2021, under the Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License.
#  For more information see: http://creativecommons.org/licenses/by-nc-sa/3.0/us/
#  All rights reserved. 
#--------------------------------------------------------------------------------#


#--------------------------------------------------------------------------------#
# install R and necessary packages
#--------------------------------------------------------------------------------#

The session information below lists the R and macOS versions under which the programs were run, along with the R packages needed to reproduce the results:

R version 4.2.2 (2022-10-31)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS 14.2.1

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] stargazer_5.2.3   nnet_7.3-18       ggpubr_0.6.0      magrittr_2.0.3    data.table_1.15.4 zoo_1.8-12        ROCR_1.0-11       rgl_1.2.8         plot3D_1.4       
[10] geometry_0.4.7    lubridate_1.9.3   forcats_1.0.0     stringr_1.5.1     dplyr_1.1.4       purrr_1.0.2       readr_2.1.5       tidyr_1.3.1       tibble_3.2.1     
[19] ggplot2_3.5.1     tidyverse_2.0.0  

loaded via a namespace (and not attached):
 [1] tidyselect_1.2.1  xfun_0.44         lattice_0.20-45   carData_3.0-5     tcltk_4.2.2       colorspace_2.1-0  vctrs_0.6.5       generics_0.1.3    htmltools_0.5.8.1
[10] base64enc_0.1-3   utf8_1.2.4        rlang_1.1.4       pillar_1.9.0      glue_1.7.0        withr_3.0.0       lifecycle_1.0.4   munsell_0.5.1     ggsignif_0.6.4   
[19] gtable_0.3.5      htmlwidgets_1.6.2 misc3d_0.9-1      knitr_1.47        magic_1.6-1       tzdb_0.4.0        fastmap_1.2.0     extrafont_0.19    fansi_1.0.6      
[28] Rttf2pt1_1.3.11   broom_1.0.6       Rcpp_1.0.12       scales_1.3.0      backports_1.5.0   jsonlite_1.8.8    abind_1.4-5       hms_1.1.3         digest_0.6.35    
[37] stringi_1.8.4     rstatix_0.7.2     grid_4.2.2        cli_3.6.3         tools_4.2.2       car_3.1-2         extrafontdb_1.0   pkgconfig_2.0.3   timechange_0.3.0 
[46] rstudioapi_0.16.0 R6_2.5.1          compiler_4.2.2   
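
As a minimal sketch (not part of the original replication scripts), the snippet below checks which of the attached packages listed above are missing from the local library. The helper name missing_packages is hypothetical. Note that install.packages() fetches the current CRAN releases, not the exact versions listed above, so results may differ slightly across versions. The tidyverse package attaches lubridate, forcats, stringr, dplyr, purrr, readr, tidyr, tibble, and ggplot2, so those are not listed separately.

```r
# Packages taken from "other attached packages" above
# (tidyverse covers its core member packages).
required <- c("stargazer", "nnet", "ggpubr", "magrittr", "data.table",
              "zoo", "ROCR", "rgl", "plot3D", "geometry", "tidyverse")

# Hypothetical helper: return the packages in `pkgs` that are not installed.
missing_packages <- function(pkgs,
                             installed = rownames(installed.packages())) {
  setdiff(pkgs, installed)
}

to_install <- missing_packages(required)
if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # Uncomment to install the current CRAN versions:
  # install.packages(to_install)
} else {
  message("All required packages are installed.")
}
```

Running sessionInfo() afterwards allows a comparison of the local package versions against the list above.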


## -------------------------------------------------- #
# file folder descriptions
## -------------------------------------------------- #

1. Data			
	Census		: Census data
		- census_plc.Rdata		: cleaned raw census data, created by census_plc.R
	
	Prediction	: predictions
		
		- img5_weighted_opencv_cov2v_nofw_level.rds	: predictions merged with election data
	
2. Output		: includes intermediate data sets
	- est_img5_weighted_opencv_cov2f_nofw_level.Rdata	: regression results, created by Figure 4.R
	- est_sum_img5_weighted_opencv_cov2f_nofw_level.Rdata	: Figure 4 regression results, created by Figure 4.R
	- val_error.csv			: classification error rate results, created by Table 2.R
3. Figures		: includes figures
4. Tables		: includes tables
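
A quick way to confirm the folder layout before running anything is sketched below. It assumes the working directory is the main folder of the replication package and that the folders are named exactly "Data", "Census", "Prediction", and "Output" as in the list above; the object names stored inside the .Rdata file are not documented here, so the sketch only lists them.

```r
# Paths assembled from the folder description above (an assumed layout).
paths <- c(
  census     = file.path("Data", "Census", "census_plc.Rdata"),
  prediction = file.path("Data", "Prediction",
                         "img5_weighted_opencv_cov2v_nofw_level.rds"),
  val_error  = file.path("Output", "val_error.csv")
)

# Record which files are present, keeping the names for lookup.
found <- setNames(file.exists(paths), names(paths))
print(found)

# Load each file only if it exists.
if (found[["census"]]) {
  census_env <- new.env()
  load(paths[["census"]], envir = census_env)  # .Rdata restores named objects
  print(ls(census_env))                        # show what was loaded
}
if (found[["prediction"]]) {
  pred <- readRDS(paths[["prediction"]])       # .rds holds a single object
  str(pred, max.level = 1)
}
if (found[["val_error"]]) {
  val_error <- read.csv(paths[["val_error"]])
  print(head(val_error))
}
```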


## -------------------------------------------------- #
# R scripts
## -------------------------------------------------- #

All R scripts that generate outputs are included in the main folder. The R scripts required to reproduce the outputs in the main manuscript are:
	- Figure 1.R
	- Figure 2.R
	- Figure 3.R
	- Figure 4.R
	- Table 2.R

The R scripts can be run in any order.
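
For convenience, the scripts listed above could be run in one pass with a small driver like the following. This is a sketch rather than part of the replication package: it assumes the working directory is the main folder, and it sources each script in its own environment so that objects from one script do not leak into the next.

```r
# Script names as listed above.
scripts <- c("Figure 1.R", "Figure 2.R", "Figure 3.R",
             "Figure 4.R", "Table 2.R")

for (s in scripts) {
  if (file.exists(s)) {
    message("Running ", s)
    source(s, local = new.env())  # isolate each script's objects
  } else {
    message("Skipping missing script: ", s)
  }
}
```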


## -------------------------------------------------- #
# additional notes 
## -------------------------------------------------- #

1. Census Data		: the original data come from NHGIS, which prohibits redistribution of its data. The provided dataset is therefore a subset of the original data that includes only the relevant variables.

2. Replication materials for the analyses in the appendix can be provided upon request. 

## -------------------------------------------------- #
# contact info 
## -------------------------------------------------- #

For any questions, contact Diana Da In Lee at dl2860@columbia.edu






